feat: add Codex parity, discuss-phase scouting, and agent quality guards #811
Tibsfox wants to merge 3 commits into gsd-build:main from …
…agent role generation

Expand Codex adapter with AskUserQuestion → request_user_input parameter mapping (including multiSelect workaround and Execute mode fallback) and Task() → spawn_agent mapping (parallel fan-out, result parsing).

Add convertClaudeAgentToCodexAgent() that generates <codex_agent_role> headers with role/tools/purpose and cleans agent frontmatter.

Generate config.toml with [features] (multi_agent, request_user_input) and [agents.gsd-*] role sections pointing to per-agent .toml configs with sandbox_mode (workspace-write/read-only) and developer_instructions.

Config merge handles 3 cases: new file, existing with GSD marker (truncate + re-append), existing without marker (inject features + append agents). Uninstall strips all GSD content including injected feature keys while preserving user settings.

Closes gsd-build#779

Co-Authored-By: Claude Opus 4.6 <[email protected]>
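The three-case config merge described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in bin/install.js: the marker string `GSD_MARKER` and the function name `mergeConfigSketch` are hypothetical, and the feature-key injection step for an existing `[features]` table is deliberately left out.

```javascript
// Hypothetical marker line; the real marker used by the installer is not
// shown in this PR text.
const GSD_MARKER = '# GSD Agent Configuration (managed)';

// Sketch of a three-case merge.
// `existing` is the current config.toml text, or null when no file exists.
// `gsdBlock` is the generated block and is assumed to start with GSD_MARKER.
function mergeConfigSketch(existing, gsdBlock) {
  // Case 1: no file yet, so the generated block becomes the whole file.
  if (existing === null) return gsdBlock + '\n';

  // Case 2: marker present, so keep only what precedes it (user content).
  // Case 3: no marker, so everything in the file is user content.
  // (The real merge also injects feature keys here; omitted in this sketch.)
  const idx = existing.indexOf(GSD_MARKER);
  const userPart = (idx !== -1 ? existing.slice(0, idx) : existing).trimEnd();

  // Re-append the generated block after the preserved user content.
  return (userPart ? userPart + '\n\n' : '') + gsdBlock + '\n';
}
```

Because the marker case truncates before re-appending, running the merge twice with the same block yields the same file, which is the idempotency property the tests are said to cover.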
Add lightweight codebase scanning before gray area identification:
- New scout_codebase step checks for existing maps or does targeted grep
- Gray areas annotated with code context (existing components, patterns)
- Discussion options informed by what already exists in the codebase
- Context7 integration for library-specific questions
- CONTEXT.md template includes code_context section

Co-authored-by: Claude Opus 4.6 <[email protected]>
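The "existing maps first, targeted grep otherwise" decision in the scout_codebase step is a workflow instruction for the agent rather than installer code, but its logic can be sketched in a few lines. The function name `chooseScoutStrategy` and its return shape are assumptions made for illustration; the `.planning/codebase/*.md` location is taken from the PR description.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Sketch: prefer existing codebase maps, fall back to a targeted grep.
// The return shape ({ strategy, files }) is hypothetical.
function chooseScoutStrategy(projectRoot) {
  const mapDir = path.join(projectRoot, '.planning', 'codebase');

  // Collect any markdown maps (e.g. CONVENTIONS, STRUCTURE, STACK).
  let maps = [];
  if (fs.existsSync(mapDir)) {
    maps = fs.readdirSync(mapDir)
      .filter((name) => name.endsWith('.md'))
      .sort();
  }

  if (maps.length > 0) {
    return {
      strategy: 'read_maps',
      files: maps.map((name) => path.join(mapDir, name)),
    };
  }

  // No maps available: the caller would run a narrow search instead.
  return { strategy: 'targeted_grep', files: [] };
}
```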
…nd task-level TDD (gsd-build#736)

- gsd-executor: Add <analysis_paralysis_guard> block after deviation_rules. If executor makes 5+ consecutive Read/Grep/Glob calls without any Edit/Write/Bash action, it must stop and either write or report blocked. Prevents infinite analysis loops that stall execution.
- gsd-plan-checker: Add exhaustive cross-check in Step 4 requirement coverage. Checker now also reads PROJECT.md requirements (not just phase goal) to verify no relevant requirement is silently dropped. Unmapped requirements become automatic blockers listed explicitly in issues.
- gsd-planner: Add task-level TDD guidance alongside existing TDD Detection. For code-producing tasks in standard plans, tdd="true" + <behavior> block makes test expectations explicit before implementation. Complements the existing dedicated TDD plan approach; both can coexist.

Co-authored-by: CyPack <GITHUB_EMAIL_ADRESIN>
Co-authored-by: Claude Sonnet 4.6 <[email protected]>
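The analysis paralysis guard is written as an instruction inside the agent prompt, so nothing in the PR enforces it in code. Purely to make the counting rule concrete, it could be modelled like this. The function name `guardTripped` is hypothetical, and treating tools outside both sets as neutral (neither counted nor resetting) is an assumption.

```javascript
const READ_ONLY_TOOLS = new Set(['Read', 'Grep', 'Glob']);
const ACTION_TOOLS = new Set(['Edit', 'Write', 'Bash']);
const LIMIT = 5;

// Returns true once the call log contains LIMIT read-only calls in a row
// with no action call in between.
function guardTripped(toolCalls) {
  let streak = 0;
  for (const tool of toolCalls) {
    if (ACTION_TOOLS.has(tool)) {
      streak = 0; // any Edit/Write/Bash resets the run
    } else if (READ_ONLY_TOOLS.has(tool)) {
      streak += 1;
    }
    // Assumption: any other tool leaves the streak unchanged.
    if (streak >= LIMIT) return true;
  }
  return false;
}
```

When the guard trips, the executor is expected to either write code or report that it is blocked, rather than continue reading.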
Code Logic Verification of PR #4

PR: feat: add Codex parity, discuss-phase scouting, and agent quality guards

1. The Before State (Bug Paths)

Codex gap: The Codex runtime adapter had a minimal 6-line skill header with no …

Blind discuss-phase: The discuss-phase generated gray areas purely from the ROADMAP.md phase description without examining the actual codebase. It could not suggest reusing existing components, highlight established patterns, or annotate options with code context, leading to decisions that ignored what was already built.

Agent execution drift: Three recurring failure modes:
2. The After State (Fix Paths)

Commit 1: Full Codex adapter expansion with 8 new functions/constants:

30 tests across 8 suites covering all functions, edge cases (no frontmatter, unknown agents, idempotency, GSD-only cleanup), and one integration test with real agent files.

Commit 2: discuss-phase (+77 lines workflow, +14 lines template, +13 lines command)

New …

Commit 3: 3 agent files (+32 lines)
3. Key Correctness Checks
4. Verdict

PASS: All three commits implement their stated objectives correctly. The Codex adapter is comprehensive (8 functions, 30 passing tests, clean install/uninstall paths with idempotency). The discuss-phase scouting adds genuine value by grounding gray areas in actual codebase state. The agent quality guards address real failure modes with well-designed escape hatches. All 3 commits confirmed present on main via cherry-pick (commits …).

Noted concerns (non-blocking):
Upstream: gsd-build/get-shit-done#811 (open, CONFLICTING). All changes already on upstream main. Recommend closing both PRs.

Verification performed using parallel code trace and adversarial review teams. Codex adapter tests independently executed and confirmed 30/30 passing.

- Tibsfox ^.^
Summary
Three feature enhancements that improve Codex runtime parity, discuss-phase intelligence, and agent execution quality:
- … request_user_input mapping in skill adapter, config.toml generation with per-agent .toml files, agent role headers, sandbox mode assignment, and clean uninstall support
- … <code_context> section in CONTEXT.md template
- … <behavior> blocks (gsd-planner)

Motivation
Codex parity gap: The Codex runtime adapter had a minimal skill header with no AskUserQuestion mapping and no multi-agent configuration. GSD workflows that rely on interactive questioning (discuss-phase) or agent spawning (execute-phase) could not function on Codex. Agent .md files were converted with basic markdown transforms but lacked the <codex_agent_role> header and per-agent .toml sandbox configs that Codex requires for proper isolation.

Blind discuss-phase: The discuss-phase workflow generated gray areas purely from the ROADMAP.md phase description without examining the actual codebase. This meant it could not suggest reusing existing components, highlight established patterns, or annotate options with code context -- leading to decisions that ignored what was already built.
Agent execution drift: Three recurring failure modes in production:
Changes
Commit 1: bf26f95 -- Codex request_user_input, multi-agent config, agent role generation

bin/install.js (+298 lines):
- CODEX_AGENT_SANDBOX map: 11 agents with sandbox modes (9 workspace-write, 2 read-only)
- getCodexSkillAdapterHeader(): Expanded from 6-line stub to three structured sections:
  - … $skillName, {{GSD_ARGS}})
  - AskUserQuestion to request_user_input parameter mapping (header, question, options, multiSelect workaround, Execute mode fallback)
  - Task() to spawn_agent mapping (agent_type, fork_context, parallel wait pattern, result markers, close_agent)
- convertClaudeAgentToCodexAgent(): Adds <codex_agent_role> header with role/tools/purpose, cleans frontmatter (drops color/tools fields, quotes name/description)
- generateCodexAgentToml(): Per-agent .toml with sandbox_mode and developer_instructions from agent body
- generateCodexConfigBlock(): Generates [features] (multi_agent, default_mode_request_user_input) and [agents] (max_threads=4, max_depth=2) with per-agent sections referencing .toml config files
- stripGsdFromCodexConfig(): Clean removal of GSD sections during uninstall (handles marker-based, injected keys, and [agents.gsd-*] sections)
- mergeCodexConfig(): Three-case merge (new file, existing with marker, existing without marker with feature injection)
- installCodexConfig(): Orchestrates agent discovery, .toml generation, and config merge
- … GSD_TEST_MODE env var) for module-level testing without CLI side effects
- … .toml files and cleans config.toml

tests/codex-config.test.cjs (+412 lines, 30 tests across 8 suites):
- getCodexSkillAdapterHeader: Section presence, invocation syntax, parameter mapping, spawn_agent mapping
- convertClaudeAgentToCodexAgent: Frontmatter cleanup, slash command conversion, no-frontmatter passthrough
- generateCodexAgentToml: Sandbox mode assignment (workspace-write, read-only, default), developer_instructions embedding
- CODEX_AGENT_SANDBOX: Agent count (11), write/read-only classification
- generateCodexConfigBlock: Marker, feature flags, agent limits, per-agent sections
- stripGsdFromCodexConfig: GSD-only removal, user content preservation, injected key stripping, empty section cleanup, [agents.gsd-*] removal
- mergeCodexConfig: Three cases + idempotency + existing [features] injection
- installCodexConfig (integration): End-to-end with real agent files

Commit 2: 0dc8120 -- Code-aware discuss phase with codebase scouting (#727)

commands/gsd/discuss-phase.md:
- … Glob and Grep to allowed-tools (needed for codebase scouting)
- … Task and mcp__context7__* to allowed-tools for auto-advance and library documentation lookup

get-shit-done/workflows/discuss-phase.md (+77 lines):
- … <step name="scout_codebase"> between check_existing and analyze_phase:
  - … .planning/codebase/*.md maps first (CONVENTIONS, STRUCTURE, STACK)
  - … <codebase_context> (reusable assets, established patterns, integration points, creative options)
- … analyze_phase to use codebase_context for grounded analysis
- … present_gray_areas with code context annotation examples
- … discuss_areas with code-context-annotated option examples and Context7 library lookup
- … write_context to include <code_context> section (reusable assets, established patterns, integration points)

get-shit-done/templates/context.md (+14 lines):
- … <code_context> section template with Reusable Assets, Established Patterns, and Integration Points subsections

Commit 3: 9124906 -- Analysis paralysis guard, exhaustive cross-check, task-level TDD (#736)

agents/gsd-executor.md (+10 lines):
- … <analysis_paralysis_guard> section: After 5+ consecutive Read/Grep/Glob calls without Edit/Write/Bash, executor must stop and either write code or report "blocked" with the specific missing information

agents/gsd-plan-checker.md (+2 lines):

agents/gsd-planner.md (+20 lines):
- … tdd="true" and <behavior> block with explicit test expectations
- … <behavior> element

Relationship to Other PRs
This is PR #4 of 6 from the dev-bugfix branch:
PR #5 dependency: PR #5 adds agent frontmatter parsing improvements that build on the agent definitions modified here (gsd-executor, gsd-planner, gsd-plan-checker). No code conflicts, but they touch overlapping agent files. Recommend merging #4 before #5.
Testing
New Tests (codex-config.test.cjs)
- getCodexSkillAdapterHeader
- convertClaudeAgentToCodexAgent
- generateCodexAgentToml
- CODEX_AGENT_SANDBOX
- generateCodexConfigBlock
- stripGsdFromCodexConfig
- mergeCodexConfig
- installCodexConfig (integration): … .md files

Full Suite Results
All 449 tests pass (30 new + 419 existing). Zero failures, zero skipped.
Impact
… request_user_input, agent spawning works via spawn_agent, and each agent gets proper sandbox isolation through generated .toml configs
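To illustrate how a per-agent .toml with sandbox_mode and developer_instructions might be produced, here is a small sketch. It is not the PR's generateCodexAgentToml(): the map contents are a hypothetical two-entry subset (this text does not say which two of the 11 agents are read-only), the read-only fallback for unknown agents is an assumption, and the name `agentTomlSketch` is invented for the example.

```javascript
// Hypothetical subset of the sandbox map; the real CODEX_AGENT_SANDBOX
// lists 11 agents and its exact assignments are not shown here.
const SANDBOX_SKETCH = new Map([
  ['gsd-executor', 'workspace-write'],
  ['gsd-plan-checker', 'read-only'],
]);

// Assumed fallback for agents missing from the map.
const DEFAULT_SANDBOX = 'read-only';

// Builds the text of a per-agent .toml file from the agent's markdown body.
function agentTomlSketch(agentName, agentBody) {
  const mode = SANDBOX_SKETCH.get(agentName) || DEFAULT_SANDBOX;
  // JSON.stringify produces a double-quoted string with \n and \" escapes,
  // which also reads as a TOML basic string for ordinary text.
  const instructions = JSON.stringify(agentBody.trim());
  return [
    `sandbox_mode = "${mode}"`,
    `developer_instructions = ${instructions}`,
    '',
  ].join('\n');
}

// Example usage: prints a two-line TOML document for the executor agent.
console.log(agentTomlSketch('gsd-executor', 'Execute the plan.\nCommit each task.'));
```

Encoding the body as a single escaped string keeps the file to one key per line; a multi-line TOML string would be an equally reasonable choice.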